mem: Add faster tiered buffer pool #8775
Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master    #8775      +/-   ##
==========================================
- Coverage   83.22%   80.66%   -2.56%
==========================================
  Files         418      416       -2
  Lines       32385    33495    +1110
==========================================
+ Hits        26952    27019      +67
- Misses       4050     4663     +613
- Partials     1383     1813     +430
```
easwars left a comment
I thought we decided to make it possible for the user to set the default buffer pool for the whole process through an experimental API, and get rid of the existing dial option and the server option. Was your plan to do that in a follow-up?
I was waiting for the author of #8770 to raise a PR for exposing a function to set the default buffer pool. See the second part of #8770 (comment).
I'm not sure if we want to do this. Maybe people want to use different buffer pools for each channel. In this PR, I'm just improving the existing buffer pool implementation.
Also, would it make sense to add some of the micro benchmarks that you used as part of this PR? Thanks.
3f63cea to
e467b97
Compare
/gemini review
Code Review
This pull request introduces a new binaryTieredBufferPool which provides a significant performance improvement for buffer pool lookups by using power-of-2 tier sizes for O(1) lookups. The implementation is clever, leveraging math/bits for efficient calculations. The accompanying tests are thorough, including architecture-specific checks and benchmarks that demonstrate the performance gains. I've found a minor issue in one of the new benchmark tests that could lead to a panic. Overall, this is an excellent contribution that improves performance and is well-implemented.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
mem/buffer_pool.go
Outdated
```go
var defaultBufferPoolSizeExponents = []uint8{
	8,
	goPageSizeExponent,
	14, // 16KB (max HTTP/2 frame size used by gRPC)
	15, // 32KB (default buffer size for io.Copy)
	20, // 1MB
}
```

```diff
-var defaultBufferPool BufferPool
+var (
+	defaultBufferPool BufferPool
+	uintSize = bits.UintSize // use a variable for mocking during tests.
+)
```
Nit: Group all these vars in one block, or don't have a block at all (like you currently do for consts).
Grouped all the vars and consts in a single block.
mem/buffer_pool.go
Outdated
```go
// Allocating slices of size > 2^maxExponent isn't possible on
// maxExponent-bit machines.
if int(exp) > maxExponent {
	return nil, fmt.Errorf("allocating slice of size 2^%d is not possible", exp)
}
```
Nit: prefix the error string with package name?
Added the prefix.
```go
func (b *binaryTieredBufferPool) Put(buf *[]byte) {
	b.poolForPut(cap(*buf)).Put(buf)
}
```
Can we add a comment here capturing the subtle but important fact that we are passing the capacity of the buffer, and not the size of the buffer to poolForPut. If we did the latter, all buffers would eventually move to the smallest pool.
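The subtlety can be illustrated with a small sketch (`tierForPut` and the tier sizes below are hypothetical stand-ins, not the PR's actual code): after a caller re-slices a buffer, only its capacity still identifies the tier it was allocated from.

```go
package main

import "fmt"

// tierForPut is a hypothetical stand-in for poolForPut's tier
// selection: it returns the index of the largest tier whose size
// still fits within the given capacity c.
func tierForPut(c int) int {
	tiers := []int{64, 4096, 32768}
	idx := 0
	for i, size := range tiers {
		if c >= size {
			idx = i
		}
	}
	return idx
}

func main() {
	buf := make([]byte, 32768) // buffer taken from the 32KB tier
	buf = buf[:100]            // caller re-sliced it to the bytes it used

	// The length no longer reflects the tier the buffer came from;
	// only the capacity does. Using len here would return every
	// recycled buffer to ever-smaller pools over time.
	fmt.Println(tierForPut(len(buf))) // 0: the 64B tier, wrong
	fmt.Println(tierForPut(cap(buf))) // 2: the 32KB tier, correct
}
```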
mem/buffer_pool_internal_test.go
Outdated
```go
		wantErr bool
	}{
		{
			name: "32-bit valid exponent",
```
Nit: Make subtest names more like identifiers. go/go-style/decisions#subtest-names
mem/buffer_pool_internal_test.go
Outdated
```go
	t.Errorf("NewBinaryTieredBufferPool() error = %v, wantErr %t", err, tt.wantErr)
	return
}
if err == nil {
```
Nit: invert the conditional and return early for tests where an error was expected and was seen.
```go
for b.Loop() {
	for size := range 1 << 19 {
		_ = p.poolForGet(size)
		_ = p.poolForPut(size)
```
Does it matter that we are not passing capacity here to poolForPut?
It is true that this benchmark doesn't use the buffer's expected capacity, but the results should still be similar. The benchmark intentionally avoids fetching a buffer so that we measure only the pool lookup overhead, excluding allocation time. While we could determine the expected capacity by type-asserting the pool to sizedBufferPool and reading its defaultSize field, that would make the benchmark brittle by relying on a private implementation detail. Therefore, I am not making the change at this time.
/gemini review
Code Review
This pull request introduces a new binaryTieredBufferPool which provides O(1) lookup time for buffer pools by using power-of-2 tier sizes. The implementation is well-structured, includes comprehensive tests, and the benchmarks demonstrate a significant performance improvement over the existing tieredBufferPool. I have one suggestion to handle duplicate exponents in the pool configuration to prevent potential resource leaks.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
This change adds a new tiered buffer pool that uses power-of-2 tier
sizes. It reduces the lookup time for the relevant sizedBufferPool from
$O(\log n)$ to $O(1)$, where n is the number of tiers. This creates
constant-time lookups independent of the tier count, allowing users to
add more tiers without performance overhead.
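A minimal sketch of the lookup idea (not the PR's exact implementation): with power-of-2 tier sizes, the tier for a requested size can be derived in constant time from the bit length of the size via math/bits, instead of scanning or binary-searching the tier list.

```go
package main

import (
	"fmt"
	"math/bits"
)

// exponentFor returns the exponent e of the smallest power of 2 with
// 2^e >= size. bits.Len makes this O(1), independent of how many
// tiers the pool has.
func exponentFor(size int) int {
	if size <= 1 {
		return 0
	}
	return bits.Len(uint(size - 1))
}

func main() {
	for _, size := range []int{1, 100, 4096, 4097, 1 << 20} {
		fmt.Printf("size %7d -> tier 2^%d\n", size, exponentFor(size))
	}
}
```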
## Benchmarks
Micro-benchmark that measures only the pool query performance, ignoring
the allocation time:
```go
func BenchmarkSearch(b *testing.B) {
defaultBufferPoolSizes := make([]int, len(defaultBufferPoolSizeExponents))
for i, exp := range defaultBufferPoolSizeExponents {
defaultBufferPoolSizes[i] = 1 << exp
}
b.Run("pool=Tiered", func(b *testing.B) {
p := NewTieredBufferPool(defaultBufferPoolSizes...).(*tieredBufferPool)
for b.Loop() {
for size := range 1 << 19 {
// One for get, one for put.
_ = p.getPool(size)
_ = p.getPool(size)
}
}
})
b.Run("pool=BinaryTiered", func(b *testing.B) {
p := NewBinaryTieredBufferPool(defaultBufferPoolSizeExponents...).(*binaryTieredBufferPool)
for b.Loop() {
for size := range 1 << 19 {
_ = p.poolForGet(size)
_ = p.poolForPut(size)
}
}
})
}
```
With 5 tiers:
```sh
go test -bench=BenchmarkSearch -count=10 -benchmem | benchstat -col '/pool' -
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
│ Tiered │ BinaryTiered │
│ sec/op │ sec/op vs base │
Search-48 5.353m ± 2% 2.036m ± 0% -61.97% (p=0.000 n=10)
│ Tiered │ BinaryTiered │
│ B/op │ B/op vs base │
Search-48 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
│ Tiered │ BinaryTiered │
│ allocs/op │ allocs/op vs base │
Search-48 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
```
With 9 tiers:
```sh
go test -bench=BenchmarkSearch -count=10 -benchmem | benchstat -col '/pool' -
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
│ Tiered │ BinaryTiered │
│ sec/op │ sec/op vs base │
Search-48 5.659m ± 0% 2.035m ± 0% -64.04% (p=0.000 n=10)
│ Tiered │ BinaryTiered │
│ B/op │ B/op vs base │
Search-48 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
│ Tiered │ BinaryTiered │
│ allocs/op │ allocs/op vs base │
Search-48 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
```
RELEASE NOTES:
* mem: Add faster tiered buffer pool. Use `NewBinaryTieredBufferPool` to
create such pools.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Very much agree with this. As a user, I'd prefer to not have any global state at all. I'd prefer all things to be tied to a client or to a server. I can share what I want to share across instances of these by myself. This is a more flexible approach.